Wuzzy is a decentralized web crawling and search system built on the AO (Actor Oriented) protocol. The system consists of two main components:
The Wuzzy system uses a distributed architecture where:
The Nest is the central hub that:
Crawlers are autonomous processes that:
Both components use an Access Control List (ACL) system with roles:
owner: Full administrative access
admin: Administrative access
The Nest provides the following handlers for document indexing, search, and crawler management.
Indexes a document in the search database.
Action: Index-Document
Required Roles: owner, admin, Index-Document
Parameters:
document-url (string): The URL of the document to index, used as document-id
document-last-crawled-at (string): The date header from the relay device response
document-content-type (string): MIME type of the document
data (string): The content of the document
document-title (string, optional): Title of the document
document-description (string, optional): Description/summary of the document
Response:
Example:
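As a minimal sketch of how the parameters above could be assembled, the following Python helper builds an Index-Document message as a plain dictionary. It assumes that the named parameters travel as message tags and that `data` is the message body; the helper name, the dictionary layout, and the `NEST_PROCESS_ID` placeholder are hypothetical and not part of Wuzzy. The timestamp is formatted as an HTTP date because the parameter is described as mirroring a date header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_index_document_message(nest_id, url, content, content_type,
                                 crawled_at, title=None, description=None):
    """Build a plain-dict message for the Index-Document action.

    Assumption: named parameters are sent as tags and the document body
    is sent in the message's data field. `crawled_at` must be a
    timezone-aware UTC datetime.
    """
    tags = {
        "Action": "Index-Document",
        "document-url": url,
        # HTTP-date string, in the style of a `date` response header
        "document-last-crawled-at": format_datetime(crawled_at, usegmt=True),
        "document-content-type": content_type,
    }
    # Optional parameters are only included when provided
    if title is not None:
        tags["document-title"] = title
    if description is not None:
        tags["document-description"] = description
    return {"target": nest_id, "tags": tags, "data": content}
```

The resulting dictionary would still need to be sent with an AO client of your choice; this sketch only covers the message shape.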
Removes a document from the search index.
Action: Remove-Document
Required Roles: owner, admin, Remove-Document
Parameters:
document-id (string): The ID of the document to remove, typically its URL
Response:
Searches the document index for matching content.
Action: Search
Required Roles: None (public)
Parameters:
query (string): The search query
search-type (string, optional): Search algorithm to use (simple or bm25, defaults to simple)
Response:
Example:
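A minimal sketch of a Search request, again as a plain Python dictionary. It assumes the parameters are sent as message tags; the helper name and dictionary layout are hypothetical. When no search type is given, the tag is omitted so that the documented default (simple) applies.

```python
SEARCH_TYPES = ("simple", "bm25")  # the two values named in this reference


def build_search_message(nest_id, query, search_type=None):
    """Build a plain-dict message for the public Search action.

    Assumption: `query` and `search-type` are sent as message tags.
    """
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    tags = {"Action": "Search", "query": query}
    if search_type is not None:
        if search_type not in SEARCH_TYPES:
            raise ValueError(f"search-type must be one of {SEARCH_TYPES}")
        tags["search-type"] = search_type
    return {"target": nest_id, "tags": tags}
```

Validating the search type on the client side is a design choice of this sketch; it simply avoids a round trip for a value the Nest would presumably reject.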
Registers an existing crawler with the Nest.
Action: Add-Crawler
Required Roles: owner, admin, Add-Crawler
Parameters:
crawler-id (string): The process ID of the crawler to add
crawler-name (string, optional): Name for the crawler, defaults to "My Wuzzy Crawler"
Response:
Removes a crawler from the Nest.
Action: Remove-Crawler
Required Roles: owner, admin, Remove-Crawler
Parameters:
crawler-id (string): The process ID of the crawler to remove
Response:
The Crawler provides handlers for managing crawl tasks and processing web content.
Requests immediate crawling of a specific URL.
Action: Request-Crawl
Required Roles: owner, admin, Request-Crawl
Parameters:
url (string): The URL to crawl immediately
Response:
Adds URLs to the crawl task queue.
Action: Add-Crawl-Tasks
Required Roles: owner, admin, Add-Crawl-Tasks
Parameters:
data (string): Newline-separated list of URLs to crawl
Response:
Example:
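A minimal sketch of preparing the newline-separated `data` payload for Add-Crawl-Tasks from a list of URLs. The helper name, the dictionary layout, and the choice to drop blanks and duplicates are assumptions of this sketch, not documented Wuzzy behavior.

```python
def build_add_crawl_tasks_message(crawler_id, urls):
    """Build a plain-dict Add-Crawl-Tasks message from an iterable of URLs.

    Assumption: the action name is sent as a tag and the URL list is the
    message's data field, one URL per line.
    """
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank entries
        if "\n" in url or "\r" in url:
            # An embedded line break would be read as two separate tasks
            raise ValueError("a URL must not contain line breaks")
        if url not in cleaned:
            cleaned.append(url)  # keep first occurrence, preserve order
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {
        "target": crawler_id,
        "tags": {"Action": "Add-Crawl-Tasks"},
        "data": "\n".join(cleaned),
    }
```

The same payload shape would apply to Remove-Crawl-Tasks, with only the action name changed.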
Removes URLs from the crawl task queue.
Action: Remove-Crawl-Tasks
Required Roles: owner, admin, Remove-Crawl-Tasks
Parameters:
data (string): Newline-separated list of URLs to remove
Response:
Configures which Nest the crawler should submit documents to.
Action: Set-Nest-Id
Required Roles: owner, admin, Set-Nest-Id
Parameters:
nest-id (string): The process ID of the target Nest
Response:
Triggers the crawler's scheduled processing cycle.
Action: Cron
Required Roles: owner, admin, Cron
Parameters: None
Behavior:
Note: This is typically called by a scheduler, not manually.
Both components include built-in state management handlers:
Both components support ACL management:
Updates user roles and permissions.
Action: Update-Roles
Required Roles: owner, admin
Parameters:
Grant and/or Revoke operations
Retrieves current role assignments.
Action: Get-Roles
Required Roles: owner, admin, Get-Roles
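The "Required Roles" entries in this reference follow a common pattern: owner and admin may call any handler, most handlers also accept a role named after the action itself, Update-Roles accepts only owner and admin, and Search is public. A minimal sketch of such a check is shown here; the function name and the address-to-roles mapping are hypothetical, and the actual ACL implementation may differ.

```python
PUBLIC_ACTIONS = {"Search"}              # listed as "None (public)"
ADMIN_ONLY_ACTIONS = {"Update-Roles"}    # listed with owner, admin only


def is_authorized(roles_by_address, caller, action):
    """Return True if `caller` may invoke `action`.

    Assumption: `roles_by_address` maps an address to a set of role names.
    """
    if action in PUBLIC_ACTIONS:
        return True
    allowed = {"owner", "admin"}
    if action not in ADMIN_ONLY_ACTIONS:
        # Per-action role, e.g. the "Index-Document" role for Index-Document
        allowed.add(action)
    caller_roles = roles_by_address.get(caller, set())
    return not allowed.isdisjoint(caller_roles)
```

For example, an address holding only the Index-Document role could index documents but could not remove them or change roles.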
http:// and https:// - Standard web protocols
arns:// - Arweave Name System URLs
ar:// - Direct Arweave transaction URLs

arns:// - Arweave Name System URLs
ar:// - Direct Arweave transaction URLs

Note: HTTP/HTTPS support may be limited in the Nest depending on configuration.
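A minimal sketch of a client-side check against the URL schemes listed above, using Python's standard `urllib.parse`. The function name and the `allow_web` flag are hypothetical; the flag stands in for a configuration in which HTTP/HTTPS is not accepted.

```python
from urllib.parse import urlsplit

WEB_SCHEMES = {"http", "https"}
ARWEAVE_SCHEMES = {"arns", "ar"}


def is_supported_url(url, allow_web=True):
    """Return True if `url` uses one of the schemes listed in this reference.

    Assumption: `allow_web=False` models a setup without HTTP/HTTPS support.
    """
    parts = urlsplit(url.strip())
    if not parts.netloc:
        return False  # no host, name, or transaction ID after "scheme://"
    schemes = ARWEAVE_SCHEMES | (WEB_SCHEMES if allow_web else set())
    # urlsplit lowercases the scheme, so "HTTPS://" is treated like "https://"
    return parts.scheme in schemes
```

Running this check before Request-Crawl or Add-Crawl-Tasks would catch unsupported schemes early, though the handlers remain the authority on what is accepted.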
All handlers include comprehensive error checking and will respond with assertion errors if:
Errors are returned as standard AO error responses with descriptive messages.